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Abstract 

Background: Metabolic engineering aims to design microorganisms that will generate a product of interest at high yield. 
Thus, a variety of in silico modeling strategies has been applied successfully, including the concepts of elementary flux 
modes (EFMs) and constrained minimal cut sets (cMCSs). The EFMs (minimal, steady state pathways through the system) can 
be calculated given a metabolic model. cMCSs are sets of reaction deletions in such a network that will allow desired 
pathways to survive and disable undesired ones (e.g., those with low product secretion or low growth rates). Grouping the 
modes into desired and undesired categories had to be done manually until now. 

Results: Although the optimal solution for a given set of pathways will always be found with the currently available tools, 
manual selection may lead to a sub-optimal solution with respect to a metabolic engineering target. A small change in the 
selection of modes can reduce the number of necessary deletions while only slightly reducing production. Based on our 
recently introduced formulation of cut set calculations using binary linear programming, we suggest an algorithm that does 
not require manual selection of the desired pathways. 

Conc/usions:\Ne demonstrated the principle of our algorithm with the help of a small toy network and applied it to a model 
of E. coli using different design objectives. Furthermore we validated our method by reproducing previously obtained 
results without requiring manual grouping of modes. 
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Introduction 

Microorganisms are increasingly used as cell factories to 
produce a multitude of chemicals and promise great potential 
for many future applications [1-4]. Microorganisms provide many 
benefits as production hosts: (i) production of substances that are 
basically inaccessible to classical chemical synthesis (e.g., proteins 
with specific glycosylation patterns), (ii) production of substances in 
a cheaper and more environment friendly way (cheap educts, 
processes at room temperature, no need for heavy metal catalysts, 
fewer by-products), and (iii) production of bulk chemicals from 
renewable resources instead of petroleum-based feedstocks. A 
multitude of (genetically engineered) microorganisms are currently 
used in industry. There are a few examples where microorganisms 
naturally produce a product of interest with sufficient yield, but it 
is generally necessary to genetically engineer strains to obtain the 
desired properties. These genetic interventions may lead to 
optimized channeling of metabolic fluxes towards the product of 
interest and/or introduce non-native pathways to enable produc- 
tion of foreign components [5-7]. Designing such strains may 
occasionally be straightforward, such as overexpressing a gene 
involved in the pathway leading to the desired product. However, 



multiple, non-intuitive genetic interventions are possible and often 
more effective due to the high connectivity of metabolites 
(particularly in redox- and energy-metabolism). Thus, a systems 
biological analysis approach is needed that considers the metabolic 
network structure. 

Methods based on constraint-based reconstruction and analysis 
(COBRA) have been used successfully to predict complex 
intervention strategies in metabolic engineering. COBRA methods 
analyze the steady state behavior of an organism using its 
stoichiometric matrix as the main input. The stoichiometric 
matrix is a comprehensive, organism-specific collection of the 
stoichiometry of the biochemical reactions occurring in the 
organism of interest. Frameworks such as OptKnock [8] and 
RobustKnock [9] allow predicting genetic interventions that 
optimize host production capabilities. Both methods couple a 
biologically motivated objective (typically maximization of biomass 
production) with an engineering objective (e.g., maximize product 
yield). As these methods rely on an optimization principle to 
describe cellular behavior, they are considered biased [10]. 

Alternatively, elementary flux mode (EFM) analysis [11,12] can 
be used to provide an unbiased view on the steady state capabilities 
of an organism [13]. An EFM is a minimal steady state pathway 
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Figure 1. Simple metabolic toy network. The network consists of seven metabolites and seven irreversible reactions. We assume that 
metabolites A, B, and C are in steady state. Metabolites S, BM, P, and Q are not subject to the steady state assumption, as they are external 
metabolites. 

doi:1 0.1 371 /journal.pone.0092583.g001 



through a metabolic network [14,15]. Minimal in this context 
means that removing any one reaction participating in the 
pathway will block any steady state flux through it. Calculating 
the EFMs is computationally expensive [16], as their number 
increases exponentially with the number of reactions and is 
currently limited to small and medium-scale models [17]. Such 
models may already lead to several hundred million EFMs, but 
their size is sufficient to describe core metabolism (glycolysis, 
pentose phosphate pathway and citrate cycle) and pathways 
involved with the product of interest. 

One important property of EFMs is that every feasible flux 
distribution in the network can be described as a non-negative, 
linear combination of EFMs. This suggests that the entire 
metabolic space of the system can be represented by the full set 
of EFMs. Consequendy, an optimal production host can be 
designed if EFMs with unfavorable properties, e.g., low produc- 
tivity, are removed, while favorable modes with high product yield 
are maintained. An undesirable EFM can be removed easily if one 
of its participating reactions is deleted. Notice that this strategy 
does not rely on any biologically motivated objective in contrast to 
OptKnock [8] and RobustKnock [9] but only utilizes an 
engineering objective: Find a set of reaction deletions that will 
restrict the cell to desirable metabolic states only. 

The concept of removing unwanted modes under the condition 
that certain modes have to "survive" the intervention is called 
constrained minimal cut sets (cMCSs) [18]. When the EFMs are 
known and classified into desirable and undesirable modes, not 



only one but all possible MCSs leading to a desired state can be 
calculated, which usually offers different options for biological 
implementation. Successful examples include [12,19-22]. 

In this study, we address the tedious necessity of manually 
selecting modes that should be kept or disabled. cMCSs are 
dependent on allocation of the modes, and it is possible that a 
"better" design (e.g., with fewer deletions) could be found if only 
the allocation of the modes is just slighdy changed (possibly leading 
to marginally worse production). Here we present the formulation 
of a binary integer program (BIP) for calculating cMCSs such that 
manual mode selection is no longer necessary. 

Methods 

Introductory Example 

Consider the system depicted in Figure 1 . The network utilizes 
the substrate S to produce biomass (BM), a product of interest (P), 
and a by-product (Q). We define the product yield, Ypig as rate of 
product formation per substrate uptake rate. Furthermore we 
define the substrate-specific productivity (SSP) as product yield 
times specific growth rate (biomass flux/substrate uptake flux) 
[23]. We are interested in finding a genetic intervention strategy 
which allows efficient production of P. To identify desirable 
network states we performed an EFM analysis on the network. 
The EFMs are listed in Table 1. EFM3 exhibits the maximum 
product yield, F P / S = 0.5. To optimize production we therefore 
considered all other modes to be undesirable as they have smaller 



Table 1. Elementary flux modes for the example network in Figure 1. 
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Depicted are the product yield Y x /s for biomass (BM), product (P), and side product (Q) as well as the mode's substrate specific productivity, r] P : = ^bm/s x Yp/s< f° r 
each mode. 
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Figure 2. Phenotypic space of the example network in Figure 1 for different reaction deletions. The y-axis depicts the normalized 
product yield (R6+R7), whereas the x-axis shows the normalized specific growth rate (R3). Each dot represents an elementary flux mode (EFM). EFMs 
are color coded with respect to i; P . The top left panel shows the available space for the unperturbed network. The top right panel shows the 
"optimal" phenotypic space where only the mode with the highest product yield for the production of P (EFM3, see Table 1 ) is present. Such a design 
can be realized by deleting at least R2 and R6 or R2, R4, and R5. The remaining panels show different phenotypic spaces for different single reaction 
deletions (see label in each panel). 
doi:1 0.1 371 /journal.pone.0092583.g002 



product yields (see Table 1 and top left panel in Figure 2). Based 
on this categorization we set up an intervention which disables all 
undesirable functionality in the network (i.e. EFM1, EFM2, and 
EFM4) while keeping EFM3 operational. The desired design 
requires at least two deletions (R2 and R6). Additionally, a second 



MCS with three deletions (R2, R4, and R5) results in the same 
design. However, we may ask if it is possible to reduce the number 
of deletions if we (slighdy) change the partitioning of the EFMs. If 
so, what is the best way to re-categorize the EFMs? For example, if 
we considered EFM2, EFM3, and EFM4 to be desirable (as their 
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Table 2. Result of our algorithm when applied to the toy model in Figure 1 (optimization for Y P / S ). 
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Column A contains the number of deletions; MCS lists the deleted reaction(s) for this MCS; EFMs contains the surviving EFMs (numbers correspond to Table 1 ), and z is 
the minimal value for the product yield, y p/s, in the system. 
doi:1 0.1 371 /journal.pone.0092583.t002 



Jp/S^O.4) and only EFM1 to be undesirable (as it does not 
produce any product), we would require only a single deletion (R2, 
see Figure 2). Alternatively, we may ask what is the "best" 
achievable design with just one deletion. Here, best is meant with 
respect to our design objective, e.g., maximizing the minimal 
product yield Tp m i n . 

Theory 

We have utilized a BIP to calculate cMCSs [24]. In this 
formulation an EFM i is represented by a binary vector b,. (6^ = 1 
if there is a flux through the j'" 1 reaction of mode i, otherwise 
V i = 0.) Similarly, a cut set is represented by the binary vector x. 
(x J = 1 means that reaction j is not affected, whereas zero means it 
is knocked out.) To check if an EFM is hit by a cut set, we 
calculated the dot-product between b, and x. If b^x = ||b,|| then 
EFM i is not cut by x as none of the reactions contributing to EFM 
lis affected [24]. If a cut set hits EFM / then b^x<||b,|| - 1. In this 
case EFM i is removed from the metabolic capabilities of the 
network due to the property of minimality. Only one contributing 
reaction needs to be deleted to render a steady state flux through 
this mode infeasible. These two conditions can be used to set up an 
optimization problem, where x is maximized in such a way that all 
desired EFMs obey the former condition, while all undesired 
EFMs are subject to the latter constraint [24]. This approach 
requires a manual partition of the EFMs into desirable and 
undesirable modes. 

To avoid this manual partitioning we assign each EFM i a 
weight w'. For example, we can use the product yield of each 
mode as its weight. In metabolic engineering we are interested in 
maximum product yield. To achieve this we typically couple 
product formation to growth. That is, we want obligatory 
production of the product of interest at any growth rate. Thus, 
similar to the RobustFvnock approach [9], we search for an 
intervention strategy that selects modes such that the minimal 
yield of all modes (i.e., minimal weight of all modes) contributing 
to the final design will be maximized. This can be formalized 
mathematically in a BIP as follows: 

maxzmfa (la) 
s.t. ww(l-/) + wy >znm, ie{l,..,m}, (lb) 



bi r x>llb i ||y, (lc) 



bTx<||b,||(l+/)-l, (Id) 

l|x|| = ||x 0 ||-A, (le) 

l|y||>i> (if) 

w = (u' 1 ,...,u'"') T , M/ ! 'dRVi, (lg) 

x = (x 1 ,...,x") T , x J e{0,l}\/j, (lh) 

y =(/,... ,/») T , y e {0,l}Vi. (li) 



where m denotes the total number of modes in the unperturbed 
network, A is the number of required reaction deletions and 
||xo||=« represents the total number of reactions in the system. 
The binary variable y' indicates whether or not EFM i is selected 
for the final design. Due to the constraint (lc), EFM i is kept if 
y = l. Otherwise constraint (Id) guarantees that mode i is 
removed. When mode i is deleted it does not contribute to the 
maximization problem, as constraint (lb) simplifies to w m3LX >z mm , 
which is always an upper bound if we choose w max = max (w). 
Finally, constraint (If) requires that each design consist of at least 
one EFM. 

We can find alternate solutions to the optimization problem 
equation (1) if we exclude any previous solution k by the inequality 
[25]. 

(l-x /f ) T x>l, (2) 
where 1 denotes an all-one vector. 
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Figure 3. Phenotypic space for the £. coli model. Each circle represents one or more modes (indicated by size) with the respective flux value for 
ethanol secretion and biomass production, both normalized to glucose uptake. Color coding represents efficiency, defined as the product of ethanol 
yield (R_ETOHT2r/R_GLCpts) and specific growth rate (R_BIOt/R_GLCpts). The boxes envelop the solution space obtained for OptKnock [8] and the 
best solution for each deletion obtained by our algorithm (3-8 deletions). Note that while OptKnock gives different solutions for each number of 
deletions, the phenotypic space of these solutions remains the same. 
doi:1 0.1 371 /journal.pone.0092583.g003 



The BIP in equation (1) calculates cut sets that maximize z m j n 
for any fixed Af. However, these cut sets may not necessarily be 
MCS. To obtain only the "best" MCS, we solve the system 
consisting of equation (1) and equation (2) repeatedly starting with 
A = 1 . Additionally we require that at each iteration, k, 

Zmin.k IS 

either better or equal to the z m ; n j £ _i of the previous iteration 
h— 1. If we do not find any other solution, we increase A and start 
the cycle again until we reach the desired Af . Note that equation 
(2) not only excludes previous solutions but also eliminates higher 
order cut sets (i.e., cut sets that are supersets of the already 
calculated MCS). Thus, this procedure only yields MCSs. 

Finally, the conventional BIP formulation for cMCS [25], 
where modes are classified manually, is recovered if we assign a 
weight of one to all desired modes and a weight of zero to all 
undesirable modes. 

To illustrate our algorithm we applied it to the toy network 
depicted in Figure 1 and optimized for F P / S . The results are 
depicted in Table 2. 

Implementation 

To solve the BIP, we use the IBM ILOG CPLEX Optimizer, 
http:/ /www.ibm.com/ software/integration/optimization/cplex- 
optimizer, for which free academic licenses are available. We used 
the specialized CPLEX' feature populate [26] to speed up the 
calculation of alternate optima, which allows for efficient 
generation of multiple solutions with one function call. Our 
implementation of the algorithm in C is available from the authors 
on request. 

Results 

In the previous section we described our approach to determine 
cMCSs without preselecting modes. Now we apply our method to 
optimize anaerobic ethanol production from glucose in E. coli 
using the metabolic model by Trinh et al. [20] . Under these 



conditions, the model contains 47 metabolites and 59 reactions 
that give rise to 5,010 EFMs. 

Optimization for Substrate Specific Productivity (SSP) 

We aimed to design the "most efficient" ethanol producing E. 
coli strain. Here, efficiency, rjp, is understood to be synonymous to 
the SSP and defined for each EFM as the product of its ethanol 
yield, FEtOH/s> times its growth rate (biomass yield, Jbm/s): each 
normalized with respect to glucose-uptake. It is useful to use 
weights that depend on both product yield and growth rate to 
optimize the time-space yield of the fermentation process. We use 
the efficiency as weights in our analysis and maximized the 
minimum efficiency of the engineered E. coli as function of the 
required reaction deletions using our approach (see Table 3 and 
Figure 3). 

We found that the minimum efficiency differed from zero only 
for three and more deletions. That is, it is impossible to allow for 
up to two deletions and not to include an inefficient (i.e. »?p = 0) 
mode in the set of desired modes. Zero efficiency modes produce 
either no product or no biomass or neither. 

At least 10 deletions are required to reach the largest possible 
minimum efficiency. At this stage only four EFMs with identical 
overall stoichiometry survive the intervention. The following 
solutions lead to the same optimum and only restrict the solution 
space further (four surviving modes to one surviving mode, see 
Table 3). However, the maximum number of surviving modes did 
not decrease with increasing cardinality of the MCSs. While the 
decrease can be observed as a general trend, there were two 
exceptions at A = 6 and A = 8. In contrast, the minimum number 
of surviving EFMs monotonically decreased with cardinality of the 
MCSs. 

On our machine (2 CPUs: Intel Xeon X5650 2.67 GHz (six 
cores each), OS: Ubuntu 12.04) it took about 30 minutes to 
calculate aU 37828 MCSs listed in Table 3 using 10 threads. 

Figure 3 depicts a projection on biomass and ethanol flux of the 
total phenotypic space for the E. coli model used in our 
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Figure 4. Results for efficiency optimization. Ethanol secretion 
per glucose uptake (lower panel), biomass production per glucose 
uptake (middle panel) and efficiency (ethanol/glucose times biomass/ 
glucose) (top panel). Our algorithm shows a continuous increase in 
minimum efficiency with the number of deletions, but does not reach 
the theoretical optimum. 
doi:1 0.1 371 /journal.pone.0092583.g004 

calculations. All 5,010 modes are represented by circles according 
to ethanol yield (normalized to glucose uptake) and specific growth 
rate (normalized to glucose uptake). Modes with identical values 
for both were grouped. The size of the circles corresponds to the 
number of modes included in these groups. The solution spaces for 
our method are depicted as colored polygons on the right hand 



side of the figure, showing the increase in minimal efficiency with 
each step (see also Table 3). For comparison we show the 
OptKnock [8] solution space (Figure 3, red dashed-dotted line). 
Note that although the EFM spectra may differ at each additional 
knockout, the overall solution space computed by OptKnock is the 
same for all solutions with three to eight deletions. Only the 
deletion of a ninth reaction leads to a state where minimal specific 
ethanol production is always above zero independent of growth 
rate. 

Figure 4 shows the span for each maximum and minimum value 
(ethanol flux, biomass production flux and efficiency) obtained by 
our algorithm. It clearly shows that the minimum efficiency 
calculated by our method increases with each increase in the 
number of deletions. 

Optimization for Efficiency with Inclusion of "Essential" 
Modes 

We have now optimized for efficiency without any consider- 
ation of cellular maintenance. Maintenance requirements can be 
included in our algorithm in the form of "essential" modes. 
Essential EFMs are modes that must be included in the final design 
(either all of them or at least n of them) independent of the 
objective function or any weighting. Thus, essential modes must 
remain unaffected by the cMCS. The enhanced algorithm can be 
found in Supporting Information SI. 

We chose the experimentally verified design used by Trinh et al. 
[20] to demonstrate the principle. The design by these authors 
consisted of 12 EFMs, eight of which had zero growth but 
maximum ethanol flux and two of the eight provided maintenance 
energy. We used our enhanced algorithm, optimized for efficiency 
and excluded those eight zero-efficiency modes from the analysis. 
That is, we considered those eight EFMs to be essential of which at 
least one had to remain in the final, engineered solution space. 
The smallest solution we found had five deletions (see Figure 5, top 
panel). We also found a solution with six deletions (see Figure 5, 
bottom panel), which consisted of the four most efficient modes 
plus all eight "essential" modes. This design corresponded to the 
one used by Trinh et al. [20]. Note that Trinh et al. [20] used seven 
deletions for their design. However, it was already pointed out in 
[18,24] that the minimal number of deletions for this layout is six. 
Note that we manually selected the eight EFMs with the highest 
ethanol production (two of which produce maintenance energy) to 
be essential for simplicity. However, we detected all maintenance 
modes (or any desired subset thereof) automatically and populated 
the set of essential modes with them. 

Importandy, we achieved the same result when optimizing for 
ethanol yield using equation (1) (data not shown). In that case we 
used the mode's ethanol yield as weights, whereas a selection of 
"essential" modes was not necessary. The definition of the design 
objective (i.e., maximization of minimal ethanol secretion) suffices 
to recover Trinh et al.'s design. 

Discussion 

cMCSs have recentiy been introduced to predict minimal 
intervention strategies for the rational design of cell factories 
[18,24]. Desirable and undesirable network states are identified 
based on an EFM analysis, and a cMCS problem is set up, which 
can be efficiendy solved [25]. However, the predicted MCSs will 
obviously depend on the categorization of the EFMs. In this report 
we presented a modified approach based on BIP, which avoids the 
necessity of grouping the modes. Instead, the selection of modes 
and the calculation of MCSs is automatically and optimally 
regulated with respect to a user defined objective. Here, we used 
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Figure 5. Metabolic space for optimizing the efficiency with the condition of keeping at least one 
[located at (0,2)]. Best result for five deletions (top) and six deletions (bottom). 
doi:1 0.1 371 /journal.pone.0092583.g005 



'essential" mode of eight 



maximizing the minimum SSP, i/p, as the design criterion, among 
others. Maximizing the minimum of an objective is reminiscent of 
the RobustKnock approach [9], which aims for a strict growth 
coupling of byproduct formation. That is, maximizing the 
minimum guarantees that the product of interest is formed 
independent of growth rate. This optimization strategy is in 
contrast to OptKnock [8] and similar methods, which aim to 
maximize production but do not account for possible competing 
pathways. The different optimization strategies explain why we do 
not see a change in the available overall solution space of 
OptKnock over a wide range of the number of deletions compared 



to the results of our approach presented here (see Figure 3). 
However, our approach is indifferent to the nature of the 
objective. We can formulate a similar BIP using the OptKnock 
objective as well. However, our method is a single level 
optimization problem, whereas those of OptKnock and Robust- 
Knock are bi-level optimization problems. We based our 
optimization on a preceding EFM analysis, which characterized 
the complete phenotypic space, whereas OptKnock and Robust- 
Knock sample the phenotypic space with a second, inner 
optimization problem using an additional, biologically motivated 
objective. 
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The main advantage of our reformulation of the BIP in 
comparison to previous work [18,24] is use of a user-defined 
objective, which avoids the need to manually select desired and 
undesirable network states. Suppose we identified 12 desirable 
EFMs, while the rest was undesirable (see Figure 5, bottom panel), 
calculated all MCS, and found that at least six deletions were 
required. Is it possible to further reduce the number of knockouts if 
we could reclassify the modes? Our approach was able to answer 
this question (see Figure 5, top panel). Because we optimize the 
partitioning of the EFMs based on a linear optimization principle 
to reach the optimum, we ensured that no better solution exists. 
However, alternate solutions may exist and can be calculated by 
applying equation (2). In principle we can address the same 
problem with the conventional cMCS formulation [18,24]. 

We select modes, calculate all MCSs, reclassify the modes 
naively, repeat the analysis and check if the cardinality of these 
new MCS is smaller than the previously calculated ones. In the 
worst case we would have to check every possible classification of 
modes, which is computationally exhaustive if we use the 
conventional cMCS formulation. However, our reformulation 
achieves the same thing but is computationally more efficient. 
Although we only used model-intrinsic values as weights in this 
study (the ethanol yield and the SSP, we are free to choose any 
weights. Any user-defined distribution of weights can be used, for 
example to gradually favor a group of modes over others, without 
imposing a strict "desired" or "undesired" criterion. In contrast, 
we recover the conventional formulation by assigning binary 
weights to all desirable and undesirable EFMs [24]. To capture 
both aspects, the strict partitioning and the favoring of modes, our 
modified approach also allows for "essential" modes. These are 
desirable modes that are obligatorily included in the engineered 
design. Modes that provide maintenance energy could be potential 
candidates as essential modes in a design. Rather than using 
essential modes it is always an option to use a more general 
objective. Either way we were able to reproduce the experimen- 
tally implemented results by Trinh et al. [20]. 

Our analysis on maximizing the minimal ethanol yield revealed 
that total metabolic capabilities, as measured by the maximum 
number of surviving EFMs, did not monotonically decrease with 
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